A synchronization service for a distributed operating system or the like



A synchronization service which can be incorporated into a
distributed operating system as a shared service. It allows the
realization of different custom-built synchronization strategies
for different applications. This approach is based on defining a
general set of application-independent synchronization
primitives. These are provided by the distributed operating
system in the form of a synchronization service. By themselves
the individual primitives are insufficient to provide
synchronization. However, they can be combined in different ways
to realize customized synchronization strategies. This leaves the
ultimate responsibility for synchronization with the application,
but in a much simplified form. Application programs can combine
these primitives to construct the most suitable form of
synchronization.

A SYNCHRONIZATION SERVICE FOR A DISTRIBUTED OPERATING SYSTEM OR THE LIKE



This invention relates generally to the field of computers, and
more specifically to a synchronization service for use with
computers.



Background of the Invention 

As computing tasks increase in size and complexity, one approach
to speed up the execution of these tasks is to use distributed
programs. A distributed program can be defined as a computer
program which is partitioned into multiple concurrent components
which execute on separate processing sites which do not share a
common memory.

In this context, the term "program" is used to imply a global
objective (i.e. common goal). Each component (or portion) of the
program performs some portion of the overall activity required to
attain this common goal. Thus, a distributed program represents a
set of (functionally) tightly-coupled components operating in a
(physically) loosely-coupled environment.

Asynchronous operation of concurrent cooperating activities
results in time-dependencies and race conditions which can lead
to errors. For example, two processes attempting to
simultaneously update a shared variable may interfere with each
other so that an incorrect value is assigned to the variable. The
solution to such problems is through synchronization.
Synchronization can be defined as the organization of actions and
interactions of a system of concurrent asynchronous entities for
the purpose of achieving some common objective.
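
The shared-variable example can be made concrete with a small
deterministic replay. This is only an illustrative sketch: the
function name run, the process names "A" and "B", and the
read/write step model are assumptions introduced here, not part
of the described service.

```python
def run(schedule):
    """Replay read/write steps of two processes that each add 1 to x."""
    shared = {"x": 0}      # the shared variable
    private = {}           # value each process last read
    for pid, step in schedule:
        if step == "read":
            private[pid] = shared["x"]
        else:  # "write"
            shared["x"] = private[pid] + 1
    return shared["x"]


# Unsynchronized: both processes read before either writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
# Synchronized: the two updates are executed one at a time.
serialized = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

lost_update = run(interleaved)
correct = run(serialized)
```

In the interleaved schedule one of the two increments is
overwritten, whereas the serialized schedule applies both.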

One example of a distributed computer system is the case of a
replicated database where each copy of the database is on a
separate processing element. When a change is made to one copy
then this change must be propagated to all the others if
consistency is to be maintained. This involves synchronization.
The situation may be complicated further if two or more
conflicting changes are initiated simultaneously on different
copies. In that case synchronization is required not only to
ensure that all copies end up in the same state but also that the
resulting state is valid. Other situations where synchronization
is necessary include the restoration of the current state to new
or recovering copies and the handling of failures.



Distributed synchronization can also be useful in standby schemes
where redundant components are configured for greater system
availability. In this case the components have to agree as to
which will be the active and which the standby components and
must also arrange for proper switchover in case of failures.

The cited examples illustrate the diverse ways in which
synchronization is used in distributed systems. As can be
expected, different applications can have different demands on
synchronization: some may require fast response while others may
place more emphasis on reliability and fault tolerance. This
indicates that the choice of the most suitable synchronization
technique and its implementation can only be made if the
particular needs of the application are considered.

Unfortunately, in a large system supporting many different types
of distributed application programs, leaving synchronization
entirely to the application program could result in excessive
duplication of effort, unreliable design, and suboptimal
utilization of resources. Even worse, perhaps, is the possibility
that the relatively complex issue of synchronization could
dominate the design to such an extent that functional concerns
are neglected. From that point of view a trusted, system-based
synchronization facility is preferred.

There are several important characteristics of distributed
programs which make them significantly more difficult to design
and implement compared to conventional non-distributed programs:

(1) Concurrent execution. This means that there is no single
sequential control thread such as represented by the execution
trace of a non-distributed program. Concurrency introduces timing
dependencies among the system components which can lead to
deadlocks or instability.

(2) Significant communication delays. The exchange of information
between components of a distributed program involves
non-negligible and randomly-distributed transmission delays. If
these delays are comparable to the rate at which the components
change state, the system may become unstable.

(3) Partial failure modes. Failures of distributed components
require complex detection and recovery algorithms which are
difficult to design and verify. Two types of partial failures
exist:
-Communication path failures can result in the duplication,
temporal reordering, or total loss of information being
exchanged; and
-Processing component failures (hardware and software) lead to
temporary loss of functionality.
The recovering action for each type of failure is quite
different. Unfortunately, it is often difficult to distinguish
them on the basis of the observed symptoms.

From the definition of synchronization it can be seen that the
need for synchronization is determined by the shared objective of
the cooperating distributed entities. This common objective
places interdependencies on the individual entities so that a
change in the state of one necessitates appropriate changes
(reactions) in others. This can be expressed as a requirement to
preserve certain application-dependent state consistency
constraints. The problem of maintaining consistency is further
complicated by the fact that each entity, in addition to internal
interactions, is also exposed to independent interactions with
the environment. (The environment consists of other distributed
components which do not share the same objective as the
synchronized system, but which use it for their own purposes.)
This means that the stimulus to change state can occur
simultaneously in two or more synchronized entities. The
synchronization problem can then be viewed as one of ordering
concurrent interdependent activities.

The simplest form of ordering which guarantees consistency is
serialization: the execution of activities one at a time.
Although synchronization strategies exist which are not based on
serialization, they will not be considered here due to their
relative complexity.

Two basic, and not necessarily exclusive, classes of strategies
exist for achieving serialization in distributed systems:



(1) Centralized strategies. 

In this case, the ordering of activities is performed by a unique
distinguished entity. Synchronized entities, with externally
induced work requests, first approach the distinguished entity
for permission. This entity resolves concurrent requests by
granting a right to only one of the competing entities. When that
entity completes its work, the right is granted to another
entity, and so on.
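
The grant-and-release cycle described above can be sketched as
follows. The class name Coordinator, its method names, and the
first-come-first-served queue are assumptions made for
illustration; the text does not prescribe a particular scheduling
rule for the distinguished entity.

```python
from collections import deque


class Coordinator:
    """Hypothetical distinguished entity that grants one right at a time."""

    def __init__(self):
        self.holder = None        # entity currently allowed to work
        self.waiting = deque()    # competing entities, first come first served

    def request(self, entity):
        """Ask for permission; True if granted immediately, else queued."""
        if self.holder is None:
            self.holder = entity
            return True
        self.waiting.append(entity)
        return False

    def release(self, entity):
        """Called when the holder completes its work; returns the next holder."""
        if entity != self.holder:
            raise ValueError(f"{entity!r} does not hold the right")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Because every request passes through one object, at most one
entity holds the right at any time, which is the serialization
property the centralized strategy relies on.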

A major feature of this type of scheme is that there is a single
point of control. This allows the implementation of relatively
complex yet reliable and efficient scheduling algorithms.
Examples of centralized strategies can be found in "A
Decentralized Control Method in a Distributed System" by J.P.
Cabanel et al., Proceedings 1st Conference, Distributed Proc.
Systems, Huntsville, Al., 1979 and in "A Failure Tolerant
Centralized Mutual Exclusion Algorithm" by G.N. Buckley et al.,
Proceedings 4th Conference, Distributed Computer Systems, San
Francisco, Ca., 1984.

(2) Distributed strategies. 

In this case, there is no central scheduler. Instead, ordering is
accomplished through distributed agreement. Key to this scheme is
a shared "clock" (logical or physical). This is generally a
monotonically increasing numeric variable which is maintained
consistently by all the synchronized entities. Work requests are
timestamped with the clock value at the time of arrival and then
processed in order. However, because two or more requests can be
concurrent (i.e., they have the same timestamps), ties are
resolved through group negotiation: a new work request is first
broadcast to all other entities which respond either with a
simple acknowledgement or a work request of their own. Once an
entity is aware of all concurrent work requests within the group,
it orders them according to some tie-breaking rule and then
processes them. Since each entity uses the same ordering
algorithm each will perceive the same sequence of events as all
the others.
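
A minimal sketch of the ordering step, assuming that each work
request is a (timestamp, entity_id, work) tuple and that ties are
broken in favour of the lower entity identifier. Both the tuple
layout and the tie-breaking rule are assumptions chosen for
illustration; the text only requires that every entity apply the
same rule.

```python
def order_requests(requests):
    """Sort (timestamp, entity_id, work) tuples into one agreed sequence.

    Ties on the timestamp are broken by the lower entity_id, which is the
    assumed tie-breaking rule for this sketch.
    """
    return sorted(requests, key=lambda r: (r[0], r[1]))


# Two entities learn of the same concurrent requests in different orders.
seen_by_e1 = [(7, "e2", "update B"), (7, "e1", "update A"), (6, "e3", "update C")]
seen_by_e2 = [(6, "e3", "update C"), (7, "e1", "update A"), (7, "e2", "update B")]

sequence_e1 = order_requests(seen_by_e1)
sequence_e2 = order_requests(seen_by_e2)
```

Since the sort key is a pure function of the request itself, both
entities arrive at the same sequence regardless of arrival order.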

The distinguishing feature of distributed strategies is that
operation does not depend on a single critical entity at any
time. This makes them very fault-tolerant. However, they are
generally less efficient than centralized strategies when the
number of entities to be synchronized is large. Examples of
distributed strategies can be found in "Time, Clocks, and the
Ordering of Events in a Distributed System" by L. Lamport, Comm.
ACM, (21,7), July 1978, in "An Algorithm for Maintaining the
Consistency of Multiple Copies" by D. Herman et al., Proceedings,
1st Conference Distributed Proc. Systems, Huntsville, Al., 1979
and in "Synchronization in Distributed Programs" by F.B.
Schneider, ACM Transactions on Prog. Lang. & Syst., (4,2), April
1982.

Combinations of these two forms, such as the circulating
sequencer proposed in "Algorithms for Distributed Data Sharing
Systems Which Use Tickets" by G. Le Lann, Proc. 3rd Berkeley
Workshop on Dist. Data, Aug. 1978, are possible. In that scheme,
a centralized controller is used to control the clock used for
timestamping. (Although the controller function is circulated
among the distributed entities, at any given time it is performed
by only one entity.) The ordering of activities is then done in a
distributed fashion, based on timestamp
values and a tie-breaking rule.

The following patents depict examples of distributed processing
in general, and attention is directed to them: U.S. patent
3,411,139 dated November 12, 1968 by J.T. Lynch et al; U.S.
patent 3,[illegible] dated December [illegible], 1971 by Q.S.
Hoff et al; U.S. patent 3,771,[illegible] dated November 8, 1973
by R.F.R. Hamer et al; and U.S. patent 4,115,888 dated September
19, 1978 by J.L.G. Janssens et al.



Summary of the Invention

One objective of the Synchronization Service of the present
invention is to provide a set of application-independent
capabilities which would allow the construction of specific
synchronization strategies belonging to the categories listed
above. To do this it must incorporate the essential abstract
features of those strategies. These are defined in the form of a
general synchronization paradigm described in a following
section.

Because of concurrent execution and the possibility of partial
failures, it is necessary to closely synchronize the operation of
the distributed components of a program. Synchronization can be
defined as the ordering of actions and interactions of components
in a distributed program so that the state of each component
remains consistent with the common goal.

Experience with concurrent systems has shown that the
synchronization problem is difficult to solve even for
non-distributed situations; the number of possible component
interactions is usually very large, increasing the probability of
a design error.

A further difficulty is caused by the fact that no single
synchronization strategy is adequate for all distributed
programs. If multiple distributed programs are to be supported on
a system, this means that the synchronization problem may have to
be solved in many different ways.

Given the diversity of synchronization strategies and the
difficulty of implementing them, is it possible to provide some
assistance to designers of distributed programs to increase the
reliability of their designs?

The approach to this problem, presented by the present invention,
consists of providing a set of primitive synchronization
operators at the level of a distributed operating system. Such
operators can be used to construct more complex forms of
synchronization customized to different applications. This
approach has the following advantages:
-It provides a one-time trusted implementation of common
mechanisms;
-It does not favour any particular synchronization strategy which
would favour some applications but penalize others;
-It provides a systematic framework (programming model) for
designing and implementing distributed programs.

The operating system component which implements the
synchronization primitives (operators) is called the
Synchronization Service.

The essential idea behind the Synchronization Service is that the
synchronization problem can be tackled hierarchically. Each level
in the hierarchy may have different synchronization mechanisms
based on the synchronization facilities of the levels below. The
lower levels of this hierarchy can be designed to be
application-independent and can therefore be provided as a
reliable system service. This, in turn, increases the reliability
of programs and reduces development time.

This approach to distributed synchronization attempts to
decompose the synchronization problem. At the lowest level of
decomposition a general set of application-independent
synchronization primitives is defined. These are provided by the
distributed operating system in the form of a synchronization
service 10. By themselves the primitives are insufficient to
provide synchronization. However, they can be combined in
different ways to realize customized synchronization strategies.
This leaves the ultimate responsibility for synchronization with
the application program, but in a much simplified form. The role
of the synchronization service 10 is to hide many of the more
basic housekeeping functions inherent in distributed
synchronization. For instance, all fault-tolerant synchronization
schemes require a monitoring function to keep track of the
operational status of all relevant distributed components. The
present invention consolidates such a function as a system
service where it can be shared by many application programs.

Stated in other terms, the present invention is a general
service, provided within a distributed operating system, which
can be used by application and system programs to implement
synchronization between program components that are physically
distributed.

Stated in other terms, the present invention is a synchronization
service for use with a computer having a distributed operating
system, to allow the construction of a customized synchronization
scheme, for synchronizing the constituent portions of a
distributed program, the service comprising: a general set of
application-independent synchronization primitives, whereby the
construction of the customized synchronization scheme is achieved
by the selective implementation of the application-independent
synchronization primitives.

Stated in yet other terms, the present invention is a
synchronization service for use with a computer having an
operating system distributed over a plurality of processing
elements, to allow the construction of a customized
synchronization scheme, for synchronizing the constituent
components of a distributed program, the service comprising the
steps of:

a) joining a program component on a first processing element to a
group of existing program components on at least a second
processing element so that each of the existing components is
aware of the presence and location of the joining components;

b) informing each member of the group of physically distributed
program components when one or more components which are members
of the group depart from it;

c) selecting, as a distinguished member, one program component
from a group of distributed program components such that, within
the group, there is never more than one distinguished member; and

d) providing mutually exclusive rights to the group of
distributed program components such that no more than one
component can appropriate a given right at any time.

Stated in still other terms, the present invention is a
synchronization service, for use with a computer having an
operating system distributed over a plurality of processing
elements, to allow the construction of customized synchronization
schemes for synchronizing the constituent components of a
distributed program, the service including a synchronization
master control comprising: master control means for activating
the synchronization service; polling means for polling the
processing elements associated with the components of the
distributed program so as to monitor the status of the processing
elements; control means for joining new members to the group, and
for handling departures of members from the group; and a database
means containing information representative of the current state
of the synchronization service at a given point in time.



Brief Description of the Drawings 

The present invention will now be described in more detail with
reference to the accompanying drawings, wherein like parts in
each of the several figures are identified by the same reference
character, and wherein:

Figure 1 depicts a simplified block diagram of the
synchronization service of the present invention;

Figure 2a is similar to Figure 1 but is for one specific
embodiment thereof;

Figure 2b is a variation on the embodiment of Figure 2a;

Figure 2c is similar to Figure 2b;

Figure 3a is a chart depicting the primitives and corresponding
replies employed by the invention;

Figure 3b is a symbolic representation of the constituent tasks
of synchronization master control of Figure 1;

Figure 3c is a symbolic representation of the constituent tasks
of member agent 11 of Figure 1;

Figure 4 is a simplified functional flow diagram for a database;

Figure 5 is a simplified functional flow diagram for a database;

Figures 6 to 8, 9a, 9b, and 10 to 13 inclusive represent action
sequences helpful for understanding the operation of the present
invention; and

Figure 14 is a simplified representation of the usage
dependencies helpful in understanding the operation of the
present invention.

Detailed Description 

Synchronization service 10 is based on a general distributed
program paradigm. This paradigm is represented by the concept of
synchronization groups. A synchronization group is a set of
distributed program components called "members", and referred to
by the reference character 18, which cooperate to achieve a
common objective. Note that members 18 are not a part of
synchronization service 10, but they use synchronization service
10.

In other words, the distributed operating system 15, to which
synchronization service 10 is applied, will support both
distributed application programs and distributed system programs.
Both the distributed application and system programs consist of
several program components (called members 18) which in turn
consist of subcomponents called tasks. In synchronization service
10 there is one synchronization group for each distributed
application or system program.

A primitive synchronization operator has effect only within the
domain of a particular synchronization group. Synchronization
groups, therefore, encapsulate units of tightly coupled
distributed functionality. Of course, synchronization service 10
allows many synchronization groups to coexist on a single
distributed operating system 15.

The basic construct of synchronization service 10 is the
synchronization group representing a set (i.e., a system) of
distributed entities which are tightly coupled to each other in
some way. The state and action dependencies which bind these
entities are not specified at this level so that synchronization
groups are decoupled from application semantics.

Formally, a synchronization group is a set of components, called
members 18, in which each group ideally has the following
properties:

(1) Uniqueness: There can be any number of synchronization groups
in a larger system but each synchronization group is
distinguished from all others by a unique synchronization group
identifier.

(2) Physical distribution: Each member of a synchronization group
exists on a different processing element 12. (This is simply a
matter of convenience; extending the concept of synchronization
groups to logically distributed entities is possible.) Note that
there are no restrictions concerning the number of
synchronization groups which may have members 18 on a particular
processing element 12. This means that two or more
synchronization groups can overlap in physical space.

(3) Reliable communication: Communication between any pair of
members 18 is non-lossy, non-duplicating, and order-preserving.
Furthermore, full connectivity is assumed; each member 18 can
communicate directly with all other members 18. If the physical
system does not have these properties then it is assumed that an
underlying communication service exists which provides them. The
intent here is to isolate communications issues from
synchronization issues.

(4) Dynamic behavior: Members 18 can depart or join the
synchronization group at any time and independently of each
other. (The group exists as long as at least one member 18
exists.) Departures may be either application-driven or due to
processing element 12 failure. This property captures the dynamic
nature of real-world components.

(5) Mutual exclusion: Each synchronization group maintains a set
of shared objects called rights, each of which can be either free
or associated with at most one member 18. They are functionally
equivalent to semaphores (reference: E.W. Dijkstra, "Cooperating
Sequential Processes", Technical Report EWD-123, Technological
University, Eindhoven, 1965) but for a distributed environment.
(However, a member 18 can hold more than one right at a time.) A
departing member 18 cannot abscond with a right since any rights
it holds are automatically freed. In essence, rights are a
general mechanism for distinguishing between group members 18.
The assignment of functional significance to rights is up to the
application.

(6) Distinguished member: One and only one member 18 of every
synchronization group is designated as its distinguished member.
The appointment is made at random and is transferred to another
member 18 if the current distinguished member 18 departs. This
property is intended to serve those synchronization strategies
which require a central coordinator, although synchronization
service 10 makes no assumptions regarding the functional
significance of the distinguished member 18. (Note that the
distinguished member feature is simply a special case of the
mutual exclusion property but has been singled out purely for
convenience.) Since the selection and preservation of a
distinguished member 18 is handled by synchronization service 10,
application programs need not implement their own election
algorithms.
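
Properties (4) to (6) above can be illustrated with a small
single-process model. This is a sketch under stated assumptions:
the class name SyncGroup, its method names, and the in-memory
dictionaries are hypothetical and stand in for state that the
actual service would maintain consistently across processing
elements. Appointing the first joiner as distinguished member is
also an assumption; the text only states that the appointment is
made at random.

```python
import random


class SyncGroup:
    """Hypothetical in-memory model of one synchronization group."""

    def __init__(self, group_id, rights, rng=None):
        self.group_id = group_id                       # unique group identifier
        self.members = []                              # current membership list
        self.rights = {name: None for name in rights}  # right -> holder or None
        self.distinguished = None
        self._rng = rng or random.Random()

    def join(self, member):
        self.members.append(member)
        if self.distinguished is None:                 # first member of the group
            self.distinguished = member

    def acquire(self, member, right):
        """Grant a free right; a member may hold several rights at once."""
        if member not in self.members:
            raise ValueError("only group members can hold rights")
        if self.rights[right] is None:
            self.rights[right] = member
            return True
        return False

    def release(self, member, right):
        if self.rights[right] != member:
            raise ValueError("right is not held by this member")
        self.rights[right] = None

    def depart(self, member):
        """Handle a voluntary departure or a processing element failure."""
        self.members.remove(member)
        for right, holder in self.rights.items():
            if holder == member:                       # rights cannot leave with it
                self.rights[right] = None
        if self.distinguished == member:               # re-appoint at random
            self.distinguished = (
                self._rng.choice(self.members) if self.members else None
            )
```

A departing holder's rights become free again and the
distinguished designation moves to a remaining member, so the
application never has to detect either condition itself.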

A synchronization group represents a unit of synchronization. The
facilities of the synchronization service 10 (described later)
are all limited in scope to the respective synchronization group.

The synchronization problem is often formulated as a problem of
maintaining data consistency in a dynamic environment. From that
point of view, the synchronization service 10 ensures consistency
of the following information sent to members 18:

(1) current membership list;

(2) the identity of the distinguished member; and

(3) the status of all group rights.

This information is maintained consistently and correctly in the
face of continual departures and arrivals of members 18.

The concept of synchronization groups does not encompass
application program-level consistency; that is the responsibility
of the application program. Instead, a synchronization group
maintains a consistent view (on all its members 18) concerning
the status of its objects: the list of active members 18, the
status of rights, the distinguished member designation. These
responsibilities are therefore removed from the view of the
application program.

Figure 1 depicts a simplified block diagram of synchronization
service 10 of the present invention. A distributed application
program, structured as a synchronization group, typically has
members 18 (i.e. distributed program components) which are
physically distributed across two or more processing elements
12a...12n (referred to collectively as processing elements 12).
In the implementation of Figure 1, the structure of
synchronization service 10 matches the structure of the
synchronization group by providing a local synchronization
controller, i.e. member agent 11, for each group member 18. Thus,
there is a separate implementation of synchronization service 10
for each application program; note, however, that there is only
one synchronization master control 13 regardless of how many
implementations, and only one synchronization agent 14 per
processing element 12. Each implementation is functionally
independent of the others.

Note that the processing elements 12 together with processing
element 19 form part of the distributed computing environment
(i.e. distributed operating system 15) which synchronization
service 10 is designed to synchronize. Note also that each
processing element 12 may have a plurality of member agents 11,
and that processing element 19 may be combined with one of the
processing elements 12.

Member agents 11 provide the main interface to the
synchronization service 10. Application program components (i.e.
members 18) initiate synchronization activities by invoking the
desired synchronization primitives (to be described later). This
is communicated to the local member agent 11 which then interacts
with other member agents 11 in order to effect the specified
synchronization function. The member agent 11 also informs the
members 18 of synchronization requests initiated by other members
18 as well as group events such as the failure of active members
18 and the joining of new ones.

Member agents 11 are dynamic entities which follow the dynamics
of the application programs they serve. A member agent 11 is
created (by the local synchronization agent 14) when an
application program component (i.e. member 18) requests to be
synchronized with other members 18 in a synchronization group. It
is destroyed when the member 18 is unsynchronized.

To ensure coherent behaviour of synchronization service 10,
control of the individual implementations of the service 10 is
centralized. This is done through a three-level hierarchy with a
unique master controller at the top (i.e. synchronization master
control 13), an intermediate layer of controllers in the middle
(i.e. synchronization agents 14a to 14n, referred to collectively
as synchronization agents 14), and a layer of member agents 11 at
the bottom. This hierarchy allows a decomposition of the control
problem into smaller, more comprehensible subproblems. Note from
Figure 1 that there is one synchronization agent 14 for each
processing element 12, and it controls all the member agents 11
in that processing element 12.
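
The three-level control hierarchy and the member agent lifecycle
can be pictured with the following data model. It is a sketch
only: the class names MasterControl and SyncAgent, the method
names, and the use of plain dictionaries are assumptions
introduced for illustration and do not describe the actual
implementation of elements 13, 14 and 11.

```python
class SyncAgent:
    """Hypothetical synchronization agent: one per processing element."""

    def __init__(self, element_id):
        self.element_id = element_id
        self.member_agents = {}          # member name -> group it is joined to

    def synchronize(self, member, group_id):
        """Create a member agent when a member asks to be synchronized."""
        self.member_agents[member] = group_id

    def unsynchronize(self, member):
        """Destroy the member agent when the member is unsynchronized."""
        del self.member_agents[member]


class MasterControl:
    """Hypothetical unique controller at the top of the hierarchy."""

    def __init__(self, element_ids):
        # Exactly one synchronization agent per processing element.
        self.agents = {e: SyncAgent(e) for e in element_ids}

    def members_of(self, group_id):
        """List (element, member) pairs whose member agents serve a group."""
        return sorted(
            (e, m)
            for e, agent in self.agents.items()
            for m, g in agent.member_agents.items()
            if g == group_id
        )
```

Each processing element may host member agents for several
groups, which is why the group identifier is stored per member
agent rather than per synchronization agent.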

Figure 2a is similar to Figure 1, but depicts a specific
embodiment of the synchronization service, referred to by
reference character 100, as applied to distributed operating
system 115. In Figure 2a there is a synchronization master
control 13 on processing element 119, three processing elements
112a, 112b, and 112c, three synchronization agents 14a, 14b, and
14c, along with six members 18a to 18f along with their
corresponding member agents 11a to 11f respectively. In the
distributed computing example of Figure 2a, processing elements
112a, 112b, 112c and 119 are each an IBM PC-AT. Note that the
members 18a to 18f inclusive are not part of synchronization
service 100 while everything else shown in Figure 2a is. Members
18a to 18f inclusive use the synchronization service 100. Note
also that there is another synchronization master control (not
shown) on standby.

Figure 2b is similar to Figure 2a, but is further simplified and
depicts only those items that constitute one implementation of
synchronization service 100 (i.e. implementation 100a). That is,
members 18a and 18e (Figures 2a and 2b) form one synchronized
group. Members 18b, 18c, 18d, and 18f (Figure 2a) form at least
one other synchronized group.

Figure 2c is a simplified application to exemplify
synchronization service 100a of Figure 2b. In Figure 2c the
hardware implementing synchronization service 100a is a group of
IBM personal computers of the AT series, linked by an IBM LAN
(local area network) 226. That is, processing element 112a is an
IBM PC-AT computer 212a, processing element 112b is an IBM PC-AT
computer 212b, and processing element 119 is an IBM PC-AT
computer 219.

In Figure 2c, computer 212a is a telephone operator's
workstation, as is computer 212b. The application in Figure 2c is
to maintain a telephone directory and to allow the users at both
computers 212a and 212b to have access to the telephone
directory, to access it to determine an individual's telephone
number, and to be able to update the telephone directory as
changes occur. Computer 219, in this example, handles the tasks
of synchronization master control 13 and database 16 (Fig. 2b).

Returning now to the general case of Figure 1, the role of
synchronization master control 13 is to provide internal
synchronization between the components of the local
synchronization service 10. In essence, it performs those
functions where a consistent (but not necessarily correct) view
of the system 15 is required. More precisely, synchronization
master control 13 is responsible for:



(1) Activation of synchronization service 10. 

This is done by activating the synchronization
agents 14 as the processing elements 12 are
restarted.
(2) Monitoring of processing elements 12.

This function involves observing (polling) the
status of all processing elements 12 by
communicating with local synchronization agents 14. Any
changes in these states are detected by
synchronization master control 13 and appropriate
notifications are dispatched to the synchronization service
components affected by the change.

(3) Management of synchronization groups.

Synchronization master control 13 is the central
arbiter for all synchronization groups in the local
synchronization service 10. It is involved in
handling transient conditions which occur in group
operation:

-group establishment,
-joining of new members 18, and
-departures of joined members 18.
Note that synchronization master control 13 does
not participate in the steady-state operation of
synchronization groups and, consequently, is not
normally a performance bottleneck.

Synchronization master control 13 must be
highly fault-tolerant since synchronization service
10 may be used to implement standby schemes by
applications. For that reason it is backed up by at
least one other instance operating in standby
mode. If the currently active synchronization
master control 13 fails, the standby will take its place.
Because this is the Synchronization Service itself, the
selection of an active synchronization master
control 13 from the set of instances must be done
through an internal agreement (election). This is the
only place in the entire system where the
synchronization service 10 cannot be used for such a
purpose. However, in this case, the problem occurs
in a very specific context and can be solved in a
specific way (for example, by using a bully
algorithm for a distributed election as described in
Elections in a Distributed Computing System by H.
Garcia-Molina, IEEE Trans. on Computers, C-31(1),
Jan. 1982).
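
As a rough illustration of the election mentioned above, the following Python sketch shows the core rule of a bully-style election: an initiating instance defers to any live instance with a higher identifier, and the highest live instance becomes the active one. The numeric instance identifiers and the `is_alive` probe are assumptions made for this sketch, and the message exchange and timeouts of the cited algorithm are collapsed into direct function calls.

```python
from typing import Callable, Iterable, Optional


def bully_election(initiator: int,
                   instance_ids: Iterable[int],
                   is_alive: Callable[[int], bool]) -> Optional[int]:
    """Return the id of the instance that wins the election.

    Hypothetical helper: a real election exchanges messages and uses
    timeouts; here a liveness probe stands in for both.
    """
    ids = list(instance_ids)
    if not is_alive(initiator):
        return None  # a failed instance cannot start an election
    current = initiator
    while True:
        # Live instances with a higher id "bully" the current candidate.
        higher = [i for i in ids if i > current and is_alive(i)]
        if not higher:
            return current  # nobody higher answered: current wins
        current = min(higher)  # the next higher instance takes over
```

With instances 1 to 5 of which only 1, 2 and 4 are alive, an election started by instance 1 ends with instance 4 as the active synchronization master control under these assumptions.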

Once the active synchronization master control
13 has been selected, the standby resorts to a
monitoring mode in which it periodically polls the
active instance until a failure is detected.

Since a standby is used, following a failure of
the synchronization master control 13, its previous
state must be reconstructed on the standby,
preferably without involving the application program. This
can be achieved through the information kept by
the synchronization agents 14. As a consequence,
except for slightly extended service times due to
the recovery process, application programs are
unaware of synchronization master control 13
failure.


SYNC MASTER TASK 

The Sync Master Task 20 is the root task (i.e.
program) of the synchronization service 10 control
hierarchy. It provides the central control point for all
synchronization groups. It consists of four main
subcomponents as depicted in Figure 3b and is
located within synchronization master control 13.

The four main subcomponents of the Sync Master
Task 20 are as follows:

SYNC MASTER CONTROL 21 establishes and
maintains the operational state of the Sync Master
Task 20. This includes the Sync Master recovery
algorithm. Sync Master Control 21 consists of the
main procedure of the Sync Master Task 20.

POLLING CONTROL 22 is responsible for
detecting failure of processing elements 12. This
subcomponent sends periodic messages to all
synchronization agents 14. If a reply is not received
within a certain time interval (after several retries
have been attempted) the corresponding processing
element 12 is declared as failed and a
notification is sent to all remaining synchronization agents
14. This subcomponent is implemented within the
Sync Master Task 20.
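
The polling behaviour described above can be sketched in Python as follows. This is only an illustration: the `ping` callable (standing in for "send a message and wait for the reply within the time interval"), the retry count and the function names are assumptions of the sketch, not part of the described implementation.

```python
from typing import Callable, Dict, List, Tuple


def poll_agents(agents: List[str],
                ping: Callable[[str], bool],
                max_retries: int = 3) -> Dict[str, bool]:
    """Poll every synchronization agent once, retrying silent ones.

    Returns a map of agent id -> True if it replied, False if it is
    declared failed after the initial attempt plus max_retries retries.
    """
    status: Dict[str, bool] = {}
    for agent in agents:
        alive = False
        for _ in range(1 + max_retries):
            if ping(agent):  # reply arrived within the time interval
                alive = True
                break
        status[agent] = alive
    return status


def failure_notifications(status: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Build (destination agent, failed element) notification pairs."""
    failed = [a for a, ok in status.items() if not ok]
    remaining = [a for a, ok in status.items() if ok]
    return [(dest, pe) for pe in failed for dest in remaining]
```

Each failed processing element yields one notification per remaining synchronization agent, matching the rule that all remaining agents are informed.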

SYNC AGENT CONTROL 23 deals with events
which occur at the processing element 12 level.
This subcomponent is responsible for activating
newly-recovered synchronization agents 14 as well
as for accepting notifications, from the
synchronization agents 14, about the arrivals and departures of
group members 18. These are then relayed to the
appropriate Group Control 24. This subcomponent
is also implemented within the Sync Master Task
20.

GROUP CONTROL 24 handles events which
are relevant to one group. This includes the joining
and departure of group members 18. The Group
Control function is implemented by the Group
Master Task 25. There is one such task 25 for each
synchronization group. Tasks 25 are created
dynamically by the Sync Master Task 20.

The tasks comprising the Sync Master Task 20
maintain a shared database 16 (Figure 1) which
represents a snapshot of the current state of the
synchronization service 10. This database is
described later.

SYNCHRONIZATION AGENT

A synchronization agent 14 resides in the
control program of each processing element 12 which
requires synchronization service 10. It is the
sole representative of the Sync Master Task 20 in
that processing element 12. The synchronization
agent 14 has the following responsibilities:
-It accepts SYNCHRONIZE directives and creates
corresponding member agents 11.
-It monitors the status of all active member agents
11 on its processing element 12 and detects their
disappearance (spontaneous or planned).
-It notifies the synchronization master control 13 of
all changes (arrivals and departures) of member
agents 11 on its processing element 12.
The synchronization agent 14 is implemented by
the Sync Agent Task which is part of the operating
system 15 on the corresponding processing
element 12.

The synchronization agents 14 are permanent
representatives of synchronization master control
13 within their host processing element 12. They
have three main purposes:

(1) Synchronization agents 14 are a focal
point for controlling all member agents 11 within a
single processing element 12. This reduces the
load on synchronization master control 13 which
simply sends common control information to
synchronization agents 14 for distribution to local
member agents 11.

(2) Synchronization agents 14 isolate
member agents 11 from the effects of synchronization
master control 13 failures. All communication
between the synchronization master control 13 and
member agents 11 is channeled through the
synchronization agents 14. If the synchronization
master control 13 is temporarily unavailable (due to
failure), the synchronization agents 14 will hold
member agent 11 messages destined for the
synchronization master control 13 until the latter is
reinstated. In this way failures of the
synchronization master control 13 are masked from member
agents 11 and hence the applications.

(3) Synchronization agents 14 participate in
the recovery of the synchronization master control
13. When a synchronization master control 13 is
being reinstated it can reconstruct its operational
state simply by querying all the synchronization
agents 14. This is much faster and more reliable
than querying the member agents 11 since these
are more dynamic and more numerous.
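
The masking role described in purpose (2) can be illustrated with a small Python sketch. It is only a model under stated assumptions: `send_to_master` is a hypothetical callable that returns True when the synchronization master control acknowledges a message, and the class and method names are invented for the illustration.

```python
from collections import deque
from typing import Callable, Deque


class SyncAgentRelay:
    """Holds member agent messages until the master acknowledges them."""

    def __init__(self, send_to_master: Callable[[str], bool]) -> None:
        self._send = send_to_master      # hypothetical transport hook
        self._pending: Deque[str] = deque()

    def forward(self, message: str) -> None:
        """Accept a message from a member agent and try to relay it."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> bool:
        """Relay held messages in order; stop at the first failure."""
        while self._pending:
            if not self._send(self._pending[0]):
                return False             # master unavailable: keep holding
            self._pending.popleft()      # acknowledged: safe to drop
        return True

    def pending(self) -> int:
        return len(self._pending)
```

While the master is down the member agents keep calling `forward` without seeing an error; once the master is reinstated a single `flush` delivers the backlog in the original order.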

The synchronization master control 13
maintains a database 16 (Figure 1) which represents the
current state of the synchronization service 10
within the system 15. The database can be accessed
through two keys:

-by group identifier, for access to the data for a
particular synchronization group, and
-by processing element identifier, for access to
synchronization service components located on a
particular processing element 12.
The basic structure used is the linked list of
dynamically allocated control blocks, each block
corresponding to some synchronization service
component. This represents a trade-off between
the requirement to minimize storage costs and the
need for fast access to the data.

The next section describes the operation of the
internal mechanisms used to achieve the
synchronization functions. In the following discussion the
communication between member agents 11 is
assumed to be reliable; i.e., it is non-lossy,
non-duplicating, and order preserving. If the
communication medium is unreliable an underlying reliable
communication service provided within the
distributed operating system can be used.

Rights are a set of shared objects within each
synchronization group; each right can be free or
associated with at most one member 18. One
example of a right is a database lock whereby only one
user at a time can write to a database and no one
else can read or write at that time. See also the
"Update" right referred to later.

Rights are distributed in a centralized fashion
since that minimizes overhead and complexity. In
principle, this can be done by any member agent
11. For convenience, the control and distribution of
rights are performed by the distinguished member
(one of the members 18). The distinguished
member 18 already has the uniqueness and
fault-tolerant properties which are also required by the
controller for rights. Thus, the member 18 selected as
the distinguished member has to perform this
special function in addition to its standard
synchronization functions. The selection of a distinguished
member is done, by the synchronization master
control 13, at the time the group is established (see
below).

When a member 18 requires a right, its
member agent 11 directs the request to the
distinguished member 18. If the right is available, the
distinguished member 18 will grant the right and
inform the requesting member agent 11. If the right
is already appropriated, then depending on the
type of request made, the request is either queued
by the distinguished member 18 or it is refused. In
the first case, requests are handled on a first-come
first-served basis.
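
The grant, queue and refuse rules above can be sketched in Python as follows. This is a minimal model, not the described implementation: the class name, the dictionaries and the convention that a queued request returns `None` until it is served are assumptions of the sketch. The mode names "queued" and "immediate" and the reply names R-GRANTED and R-REFUSED follow the primitives listed later in this description.

```python
from collections import deque
from typing import Deque, Dict, Optional


class RightsController:
    """Models the rights bookkeeping of the distinguished member."""

    def __init__(self) -> None:
        self.holder: Dict[str, str] = {}         # right id -> member id
        self.queue: Dict[str, Deque[str]] = {}   # right id -> waiting members

    def request(self, member: str, right: str,
                mode: str = "queued") -> Optional[str]:
        if right not in self.holder:
            self.holder[right] = member
            return "R-GRANTED"
        if mode == "queued":
            # Served later, first-come first-served; no reply yet.
            self.queue.setdefault(right, deque()).append(member)
            return None
        return "R-REFUSED"

    def release(self, member: str, right: str) -> Optional[str]:
        """Free a right; return the member it is passed to, if any."""
        if self.holder.get(right) != member:
            raise ValueError("right is not held by this member")
        del self.holder[right]
        waiting = self.queue.get(right)
        if waiting:
            next_member = waiting.popleft()
            self.holder[right] = next_member
            return next_member
        return None
```

For example, if member A holds a right, an "immediate" request from B is refused, while "queued" requests from B and then C are served in that order as the right is released.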

Should the current distinguished member 18
fail, a new one is appointed by the synchronization
master control 13 (which is also responsible for
detecting the failure). Of course, until a new
distinguished member 18 is appointed, rights cannot be
distributed or retrieved, but all the other
synchronization services are still available. In order to
minimize the effect of a distinguished member 18
failure, the state of rights is reconstructed to the point
just prior to failure. Each member 18 keeps a list of
all rights which it has appropriated as well as a list
of all its outstanding rights requests. This
information is then exchanged with the new distinguished
member 18 which can then assume the same state
as the previous distinguished member 18. The
entire switchover process is transparent to the
application program.

If a member 18 fails, the distinguished member
will automatically release any rights held by that
member 18 and also purge any queued requests
generated by that member 18.
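
Both recovery rules can be illustrated with a short Python sketch: rebuilding the rights state on a new distinguished member from the lists each member keeps, and purging a failed member. The report format and function names are assumptions made for the illustration. Note also that the original arrival order of queued requests across members cannot be recovered from these lists alone, so the sketch simply uses the order in which the reports are processed.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# member id -> (rights it holds, rights it has requested and awaits)
Reports = Dict[str, Tuple[List[str], List[str]]]


def rebuild_rights_state(reports: Reports):
    """Reconstruct holder and queue tables from per-member lists."""
    holder: Dict[str, str] = {}
    queue: Dict[str, Deque[str]] = {}
    for member, (held, outstanding) in reports.items():
        for right in held:
            holder[right] = member
        for right in outstanding:
            queue.setdefault(right, deque()).append(member)
    return holder, queue


def purge_member(holder: Dict[str, str],
                 queue: Dict[str, Deque[str]],
                 failed: str) -> List[str]:
    """Release rights held by a failed member and drop its requests."""
    freed = [right for right, member in holder.items() if member == failed]
    for right in freed:
        del holder[right]
    for right in list(queue):
        queue[right] = deque(m for m in queue[right] if m != failed)
    return freed
```

A fuller model would hand each freed right to the next queued member; that step is omitted here to keep the sketch short.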

Member agent 11 is the main functional
component of synchronization service 10 and is
responsible for handling all directives initiated by the
user. It performs four classes of functions as
depicted in Figure 3c and as represented by the
following:

The COMMUNICATIONS HANDLER 33
provides a reliable (order-preserving, non-lossy,
non-duplicating) communications service between
group members 18; in order to minimize deadlocks
the communication mode used is asynchronous
message passing. This function is required only if
there is no reliable communications service present
within the distributed operating system 15.

The GROUP STATE HANDLER 32 maintains a
local version of the current state of all the other
group members 18.

The DIRECTIVE HANDLER 31 provides the
interface between user tasks (components of
members 18) and the member agent 11.

The DM HANDLER 30 implements the
distinguished member functionality and is active on only
one member 18 of the group at a time. This
member 18 is selected by the Group Master Task
25 (Figure 3b). The distinguished member 18 is
responsible for allocation of rights as well as for
broadcasting group status change notifications to
all other members 18 of the group. (This
information is received from the Group Master Task 25.)

Member agents 11 are created dynamically by
the synchronization agent 14 in response to a
SYNCHRONIZE directive (primitive). They are also
destroyed by the synchronization agent 14 after
they have left the group or following a failure.



Broadcasts and Acknowledgements

When an application program initiates a
broadcast (via the GROUP-BROADCAST primitive), its
local member agent 11 distributes the information
to all other active member agents 11. It then
accumulates acknowledgements until all active member
agents 11 have replied, after which the application
program is notified (via the GRP-ACK reply signal).

If an element 12 fails before its
acknowledgement is dispatched, the broadcasting member
agent 11 will assume an implicit acknowledgement
from that member so that failures will not disrupt
the application.
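
The acknowledgement accounting just described can be sketched as a small Python class. The class and method names are assumptions of the sketch; the only rules taken from the text are that the broadcaster waits for every active member agent and treats a failure as an implicit acknowledgement.

```python
from typing import Iterable, Set


class BroadcastTracker:
    """Tracks which member agents still owe an acknowledgement."""

    def __init__(self, active_members: Iterable[str]) -> None:
        self.waiting: Set[str] = set(active_members)

    @property
    def done(self) -> bool:
        """True once the application may be notified."""
        return not self.waiting

    def ack(self, member: str) -> bool:
        """Record an explicit acknowledgement; return completion state."""
        self.waiting.discard(member)
        return self.done

    def member_failed(self, member: str) -> bool:
        """A failed member counts as having acknowledged implicitly."""
        return self.ack(member)
```

With three other active members, the broadcast completes after two explicit acknowledgements and one reported failure, so a crash never leaves the broadcaster waiting indefinitely.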



Group Establishment and Joining of New Members 

A newly joining member 18 first informs the
synchronization master control 13 (via its
synchronization agent 14) of its intent to join the
synchronization group. The synchronization master control
13 then determines if this is the first reported
member of the group. If it is, then this member 18
is designated as the distinguished member 18 and
a notification is sent back. This establishes the
group.

If the group is already established,
synchronization master control 13 registers the new
member 18 as being in the joining state and informs the
group's distinguished member agent 11. Upon
receiving this notification the distinguished member
agent 11 broadcasts a join request to all member
agents 11 on the list and waits for the
corresponding group acknowledgement. The period between
the broadcast of the join request and the full
acknowledgement of that request by all joined
member agents 11 is called the joining interval. During
that time some member agents 11 will become
aware of the new member agent 11 before others.
This opens up the possibility that some messages
broadcast within the group may bypass the
partially synchronized member agent 11. If messages
received by this member agent 11 are passed to
the application program, then the application
program function of this member 18 would not
necessarily perceive the same sequence of group
events as other members 18; it could miss some.
Therefore, the new member agent 11 must
acknowledge any messages received from other
member agents 11 (in order to satisfy the
acknowledgement requirement) but, once acknowledged,
the messages are discarded, i.e. they are not
passed on to the application (an exception is
messages containing other joining or departure
requests which are processed by the member agent
11 but still not relayed to the application). This
mode of operation remains in effect until the join
request is finally acknowledged by the entire
group. At that point, the new member 18 informs
its application that it is fully joined and switches to
normal operation. The overall effect, as perceived
by the application, is that the joining operation is
atomic.

The handling of messages that were discarded
during the joining interval is no different to the
application program than the handling of messages
missed by the member 18 while it was down; that
is, once synchronized with the group, the
application program must proceed to upgrade its
functional state to be consistent with the functional
states of other members 18. The best method for
achieving this depends on the application program.
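
The behaviour of a member agent during the joining interval can be sketched in Python as follows. The message kinds ("join", "depart", anything else), the class name and the lists used to record effects are assumptions of the sketch; handling of membership messages after the join completes is deliberately left out.

```python
from typing import List, Tuple


class JoiningMemberAgent:
    """Acknowledges everything, but delivers nothing until fully joined."""

    def __init__(self) -> None:
        self.joined = False
        self.delivered: List[str] = []                     # seen by the application
        self.membership_events: List[Tuple[str, str]] = []  # kept by the agent

    def receive(self, kind: str, payload: str) -> str:
        if kind in ("join", "depart"):
            # Processed by the agent itself in either mode (sketch only
            # records it; it is not relayed during the joining interval).
            self.membership_events.append((kind, payload))
        elif self.joined:
            self.delivered.append(payload)
        # Otherwise: still in the joining interval, so discard silently.
        return "ack"  # every message is acknowledged in either mode

    def join_acknowledged(self) -> None:
        """Called when the whole group has acknowledged the join request."""
        self.joined = True
```

Because the application sees nothing until `join_acknowledged` is called, the join appears atomic from its point of view, as the text requires.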

Departure of Members 

The departure of a member 18 from a
synchronization group occurs when the member 18
decides to unsynchronize or when the host processing
element 12 fails. In the former case, the
departing procedure is as follows: the synchronization
group (i.e. agent 11) notifies the Sync Master Task
20 of its intention. This event is relayed, via the
appropriate Group Master Task 25 (Figure 3b), to
the distinguished member 18 of the group. The
distinguished member 18 then broadcasts this
information to all other group members 18. Note that
there is one Group Master Task 25 for every
synchronization group defined in service 10.

In the case of a processing element 12 failure,
the failure is detected by the Polling Control 22
within Sync Master Task 20 (Figure 3b) and the
same sequence as described above is executed.

If the departed member 18 was a distinguished
member, Group Master Task 25 will first select a
new distinguished member 18 and then proceed in
the same manner as above.

The synchronization agents 14 are
intermediaries between synchronization master control 13 and
the member agents 11. Synchronization agents 14
are created and dispatched when their host
processing element 12 is initialized. Upon creation
they wait to be contacted by the synchronization
master 13, if one exists. Any application level
requests for synchronization are queued until an
acknowledgement is received from synchronization
master control 13.

During normal operation, the synchronization
agents 14 serve as a relay point for communication
between the synchronization master control 13 and
the member agents 11. All communication is
buffered until acknowledged by the receiver so that
the member agents 11 are protected from
temporary failures of synchronization master control
13. The synchronization agents 14 also extract and
store any information relevant to the
reestablishment of the synchronization master control 13.

Most of the operation of the synchronization
master control 13 has already been described
above. The only aspect remaining is the monitoring
function.

The monitoring of the existence of processing
elements 12 is done by the Polling Control 22
which polls each individual synchronization agent
14. The failure of a processing element 12 implies
that the corresponding synchronization agent 14 is
down as well as all member agents 11 that were
present on that processing element 12. When that
happens the synchronization master control 13
notifies all affected Group Master Tasks 25. These, in
turn, inform their distinguished member agents 11
which then broadcast this information to other
member agents 11.

Before we go any further, it may be
advantageous to introduce the primitives used with
synchronization service 10. The primitives can be split
into two categories:

(1) Synchronous Primitives are in the form of
request-reply pairs; member agents 11 submit
requests for some action to be performed on their
behalf and synchronization service 10 eventually
matches these with appropriate replies.

(2) Asynchronous Notifications are
spontaneous signals informing a member agent 11 about
changes in the status of its group or conveying a
message sent by some other member agent 11.
There are only two types of asynchronous
notifications that can be sent to a member agent 11:
-GROUP-CHANGE (group status) is sent when a
new member 18 has joined or an active member
18 has departed from the group. The status
information includes the complete new membership
list and the id of the new distinguished member.
-GROUP-MSG (message) signals the arrival of a
message from some other member 18 (broadcast
or point-to-point).

The application program must allow both forms of
communication (i.e. synchronous and
asynchronous) although it may choose to handle
asynchronous communications in a synchronous
manner by ignoring them until the current activity
sequence is complete.

The synchronous primitives and corresponding
replies are depicted in chart form in Figure 3a, to
which attention is directed.
The primitives are:
-SYNCHRONIZE (group-id)

This is a directive which is issued by a member
18 (via its member agent 11) when it wishes to
become synchronized with the group specified by
<group-id>. If no group exists at the time, one is
established. The only signal expected in reply to
this directive is the SYNC-DONE signal.



-SYNC-DONE (group-status) 

This is a signal from the synchronization
service 10 (i.e. member agent 11) in response to a
successful synchronization of a member 18
following the invocation of the SYNCHRONIZE directive.
The return parameter, <group-status>, contains the
same information about the status of the group as
the GROUP-CHANGE primitive described below. It
includes a <dm-flag> parameter which informs the
member 18 if it is the bearer of the distinguished
member status.



-UNSYNC

This directive is used when a member 18
decides to depart from its group. It ensures orderly
deactivation.



-UNSYNC-DONE 

This signal is a confirmation that the member 
18 has been removed from its synchronization 
group. 



-GROUP-CHANGE (group-status) 

This is an asynchronous signal which is
generated by the member agent 11 whenever a new
member 18 joins the group or when a member 18
departs from the group. If this member 18 is the
new distinguished member as a result of the
change, a <dm-flag> parameter in the
<group-status> data record will be set appropriately. The
treatment of this situation is left to the application
program. The new status of the group is also
returned.



-REQ-RIGHT (right-id, mode)

This directive is issued when a member 18
needs exclusive access to a group right. If the right
is available, it is guaranteed to be granted to only
one requesting member 18 (there may be multiple
simultaneous requests for the same right). If the
right is not available, then if the <mode> parameter
specifies a "queued" request it is queued until it
can be serviced. Alternatively, if the <mode>
parameter specifies "immediate" the request is
refused since the right has already been appropriated
by another member 18 of the group.

-R-GRANTED (right-id)

This signal informs a member 18 that it has
been granted the required right.

-R-REFUSED (right-id)

This signal informs a member 18 which has
requested a right with the "immediate reply" mode
specified in the request that the right is not
available. (If a queued request was made then this
signal will never be generated.)



-REL-RIGHT (right-id)

This directive is used to release an appropriated
right.

-R-RELEASED (right-id)

This signal is the reply to the REL-RIGHT 
directive. 


-QRY-RIGHTS 

This is a directive which is used to obtain a
snapshot of the distribution of group rights among
group members.



-R-STATUS (rights-status) 

This is a reply signal to the QRY-RIGHTS
directive. The <rights-status> parameter lists, for
each group right, the member-id of the member
which owns it, if any.
Note that service 10 cannot guarantee the
currency of the returned information since changes in
the distribution of rights can occur at any time.
-GRP-BRDCST (message)

This directive is used to broadcast a
synchronization event (message) to all synchronized
members 18. It is the responsibility of the
synchronization service 10 (via member agent 11) to ensure
that all members 18 receive the message. The
<message> parameter can be used to timestamp
the synchronization event. The higher level
software is responsible for supplying this parameter as
well as interpreting its functional significance.



-GROUP-ACK

This is an acknowledgment signal for the
GRP-BRDCST directive. It signifies that all
members 11 have received the latest broadcast
message.


-SND-TO-MEM (message) 

This directive is used to send a point-to-point
message to another group member 18.



-MSG-ACK

This is an acknowledgement that the latest
point-to-point message has been received by the
destination member 18.

Before the invention is described further, it
may be of value to give some brief examples of the
application of the primitives.

The first example is the control of a standby
configuration. In this configuration there are two or
more distributed program components (i.e.
members 18), each on a different processing element
12, each of which is equally capable of providing
the necessary function. Only one should be active
at any given time while the others are standing by,
ready to be activated should the active one fail.
Assuming that they are all part of the same
synchronization group, the algorithm which each
member 18 executes is the same (the
synchronization service primitives are highlighted in capitals):
SYNCHRONIZE;
Wait for SYNC-DONE signal;
If not selected as the distinguished member then
    Repeat
        Listen for SYNC-CHANGE signals;
    until selected as the distinguished member
Execute function;

If a member 18 is not selected as the
distinguished member following synchronization with the
group, then it simply waits until it is designated as
the distinguished member.
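
The standby algorithm can be restated as a short Python sketch. The representation of signals as (name, dm-flag) pairs and the function name are assumptions of the sketch. The listing above waits for SYNC-CHANGE signals; the sketch assumes this corresponds to the GROUP-CHANGE notification defined earlier, whose <dm-flag> reports distinguished member status.

```python
from typing import Callable, Iterable, Tuple

Signal = Tuple[str, bool]  # (signal name, dm-flag carried in group status)


def run_standby(signals: Iterable[Signal],
                execute: Callable[[], None]) -> bool:
    """Wait to become distinguished member, then run the function.

    Returns True if the function was executed, False if the signal
    stream ended while this member was still standing by.
    """
    synced = False
    for name, dm_flag in signals:
        if not synced:
            if name != "SYNC-DONE":
                continue          # nothing counts before SYNC-DONE
            synced = True
        elif name != "GROUP-CHANGE":
            continue              # other traffic is ignored while waiting
        if dm_flag:
            execute()             # this member is now the active one
            return True
    return False
```

A member that receives SYNC-DONE with the flag clear stays passive through any number of membership changes and activates on the first one that names it distinguished member.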



The next example concerns the updating of a
replicated database, i.e. the same example
mentioned in the Background of the Invention. In this
case there are multiple instances of a database,
each of which can initiate an update request as a
result of external activity. Such requests will be
called external to distinguish them from "shadow
requests". Shadow requests are copies of an
external request which a member 18 sends to all other
members 18 so that they can make the appropriate
changes to their copies of the database. For
brevity, the handling of any other requests except
update requests is ignored.

The solution shown below uses the mutual
exclusion feature of the synchronization group. A
right, called the Update right, is defined. The holder
of this right is the member 18 whose request will
be honored; all other members 18 must withhold
their requests and perform the shadow request
sent by the holder of the right.



Solution A 

Repeat
    Wait for next request;
    If external request then
    begin
        REQ-RIGHT (Update);
        While waiting for R-GRANTED
            Handle any incoming shadow requests;
        GRP-BRDCST (external request);
        Handle external request;
        REL-RIGHT (Update);
    end
    else
        Handle shadow request
until termination;

Note that the application program need not be
concerned with spontaneous failures of other
members 18 since that is handled by the
synchronization service 10.

An important problem which must be handled
by this application (i.e. Solution A, above) is the
addition of new or recovering instances. These will
not necessarily have the same state as the others
and therefore must be brought to the same
functional level. The situation is complicated by the
possibility that updates may be initiated at other
instances while the new instance is being upgraded.
One method of dealing with this is for the new
instance to appropriate the Update right to ensure
that the state remains unchanged while it is being
upgraded. The algorithm performed by a restarting
instance is then:
SYNCHRONIZE;
Wait for SYNC-DONE signal;
REQ-RIGHT (Update);
While waiting for R-GRANTED
    Discard any shadow requests received;
Obtain current copy of database;
REL-RIGHT (Update);

Following this, the regular algorithm described
above (i.e. Solution A) is executed.

The current copy of the database is obtained
from any other member 18 through an internal
protocol using point-to-point messages (i.e.
SND-TO-MEM directives). Instead of a copy of the entire
database it may be more convenient to request an
update log and then perform the updates missed
while the member 18 instance was down.

Figure 4 is a functional flow diagram
representing the synchronization service 10 database 16
when accessed through the processing element 12
identifier.

The head and tail pointers (AGT-LST-HD and
AGT-LST-TL), respectively, point to a linked list of
synchronization agent control blocks (tAGT-CB) for
those synchronization agents 14 involved.

There is one synchronization agent control
block tAGT-CB for each processing element 12
which requires synchronization service 10. It
contains a link (AGT-LST-LNK) to other
synchronization agent control blocks tAGT-CB. This chain
enables quick scanning of affected processing
elements 12 when an entire block of processing
elements 12 fails. Each synchronization agent control
block tAGT-CB also contains a pointer
(MMCB-LST-HD) to a chain of member agent control
blocks (tMEM-CB) which reside on that processing
element 12. Through this chain it is possible to
detect quickly all synchronization groups which are
affected by the failure of a processing element 12.
Whereas all other chains in the synchronization
service database 16 remain unchanged once they
are established, this chain follows the dynamics of
member 18 joinings and departures.

In order to simplify searching and list
maintenance, the last Sync Agent control block
tAGT-CB in the list is a dummy block.

Each member agent control block tMEM-CB
corresponds to one member 18 of one
synchronization group. Among other data, this control block
contains a pointer (not shown in the diagram) to the
corresponding synchronization agent control block
tAGT-CB. This link allows quick reconfiguring of
the processing element-member agent chain when
necessary.

Figure 5 is a functional flow diagram
representing the synchronization service database 16 when
accessed through the unique group identifier.

GRP-HDR [group id] is a static array of
pointers. Each item of the array points to a circular list
of member agent control blocks (tMEM-CB)
which belong to the same synchronization group.



The member agent control blocks tMEM-CB
are linked into a circular list to facilitate selection of
a distinguished member. This list grows as
members 18 are added to the synchronization group,
each successive block identified by the next
available positive integer (MEM-ID). This integer
corresponds to the member 18 identifier.
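
The two access paths into the database can be sketched in Python as follows. This is a simplified model under stated assumptions: Python dictionaries and lists stand in for the linked and circular lists of control blocks, and the class, field and method names are invented for the illustration rather than taken from the described implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemberCB:
    """Stand-in for a member agent control block."""
    group_id: int
    mem_id: int      # next available positive integer within the group
    pe_id: str       # back-reference to the hosting processing element


@dataclass
class SyncDatabase:
    by_pe: Dict[str, List[MemberCB]] = field(default_factory=dict)
    by_group: Dict[int, List[MemberCB]] = field(default_factory=dict)

    def add_member(self, group_id: int, pe_id: str) -> MemberCB:
        ring = self.by_group.setdefault(group_id, [])
        mem_id = max((m.mem_id for m in ring), default=0) + 1
        cb = MemberCB(group_id, mem_id, pe_id)
        ring.append(cb)
        self.by_pe.setdefault(pe_id, []).append(cb)
        return cb

    def groups_affected_by(self, pe_id: str) -> List[int]:
        """Groups that must be notified if this element fails."""
        return sorted({m.group_id for m in self.by_pe.get(pe_id, [])})

    def remove_pe(self, pe_id: str) -> None:
        """Drop every member hosted on a failed processing element."""
        for cb in self.by_pe.pop(pe_id, []):
            self.by_group[cb.group_id].remove(cb)
```

Looking up by processing element answers "which groups are hit by this failure", while looking up by group gives the membership list from which a distinguished member can be chosen.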



ACTION SEQUENCES 

This section describes various action
sequences within synchronization service 10. A
diagrammatic representation is used to show these
sequences. The following conventions are used.

• A full horizontal line indicates a message or
rendezvous between two components (i.e.
programs or tasks):

[Diagram: a horizontal line labeled "signal" drawn
from component COMP1 to component COMP2]

where signal is the name of the entry procedure in
component COMP2 which accepts the signal.
COMP1 is the component which sent the signal.

• A vertical line (|) following the reception of a
signal indicates processing within the appropriate
component which received the signal. This
processing results in one or more signals being
dispatched to other components:

[Diagram: signal-1 is received, followed by a vertical
line from which signal-2 is dispatched]

• If an asterisk (*) appears next to a signal it
implies that the signal may be repeated (to
different destinations).

• A signal which is enclosed in braces
(e.g. {signal}) indicates that the signal is not
mandatory and may be omitted depending on
circumstances.
• Bracketed numbers in the Figures (e.g. (1))
designate explanatory notes which contain textual
descriptions pertaining to various signals. The
notes are given in the text relating to the relevant
Figure. As it is believed that the Figures are
self-explanatory, only brief comments will be made
regarding the figures.



Joining of New Members 

An "empty" synchronization group is one in
which no members 18 are active. When the first
member 18 joins, it is designated as the distinguished
member by default. Figure 6 depicts a
member 18 joining an empty group. The sequence
of events is as depicted in Figure 6, to which
attention is directed. The abbreviations used in the
Figures are as follows: APPL means an application
task; SYNC AGT means the synchronization agent
task; MEM AGT means the member agent task;
SYNC MST means the synchronization master task
20; GRP MST means the Group Master Task 25;
MEM AGT (DM) means the distinguished member
agent task; APPL (DM) means the application task
which corresponds to the distinguished member
agent.

The following notes refer to the bracketed numbers
in Figure 6.

Notes:

(1) The Sync Agent will create a member
agent task only if it had not existed previously;
otherwise it will REINIT a previously created task
instance. The START-AGT signal which follows
initialization is used to pass initial data to the member
agent 11.

(2) The AGT-MST-MSG signal to the sync
master 13 contains the complete information about
all member agents 11 on this processing element
12, including the newly-created member 18. (This
ensures state convergence even in the presence of
design faults.) The MST-REPLY message is used
for positive acknowledgement so that the Sync
Agent can send the next message to the Sync
Master if it has one. (Only one outstanding message
is allowed between the Sync Master and a
Sync Agent.)
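
The "one outstanding message" rule of note (2) amounts to a stop-and-wait exchange. The Python sketch below illustrates it from the Sync Agent side; the class name SyncAgentLink, its methods, and the list standing in for the communication channel are assumptions made for illustration only and do not appear in the original listings.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class SyncAgentLink:
    """Hypothetical Sync Agent side of the link to the Sync Master."""

    def __init__(self) -> None:
        self.pending: Deque[Dict[str, object]] = deque()
        self.in_flight: Optional[Dict[str, object]] = None
        self.wire: List[Dict[str, object]] = []  # stands in for the channel

    def report(self, member_agents: List[int]) -> None:
        # AGT-MST-MSG carries the complete member agent list, not a delta.
        msg = {"type": "AGT-MST-MSG", "members": list(member_agents)}
        self.pending.append(msg)
        self._try_send()

    def on_mst_reply(self) -> None:
        # MST-REPLY is the positive acknowledgement that frees the link.
        self.in_flight = None
        self._try_send()

    def _try_send(self) -> None:
        # Send only when nothing is outstanding.
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.wire.append(self.in_flight)
```

A second report issued before the MST-REPLY arrives simply waits in the queue; it is sent as soon as the acknowledgement for the first one is received.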

(3) If this group has not previously existed, a
new Group Master is created. In that case a STARTUP
signal follows to pass initial data to the Group
Master Task, and an ACTIVATE signal is used to
force it into an operational mode. If the group had
existed previously (but had lost all its members),
the existing Group Master Task is used.

(4) Once the Group Master has been activated,
a GRP-EVENT signal is sent by the Sync
Master informing it of the joining of the first member.



(5) Upon receipt of the GRP-EVENT signal,
the Group Master selects the newly-created member
agent 11 as the distinguished member and
sends it a GRP-STATE signal. This signal establishes
a connection between the Group Master and
the Distinguished Member. All subsequent GRP-STATE
signals are sequenced to ensure proper
event ordering as well as to guard against communication
failures. The GRP-REPLY signal is used to
acknowledge one or more GRP-STATE signals and
provides reliable communication.

The GRP-STATE signal contains the complete new
state of the group rather than just information about
the changes. This ensures that the system will
converge to the true state even in the presence of
design faults.
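
The combination of sequencing and full-state messages in note (5) can be illustrated with a short Python sketch. The names GrpState, DistinguishedMember and on_grp_state, and the choice of an integer sequence number starting at 1, are assumptions for illustration; the source states only that GRP-STATE signals are sequenced and that one GRP-REPLY may acknowledge several of them.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GrpState:
    """Hypothetical GRP-STATE payload: a sequence number plus the full state."""
    seq: int
    members: FrozenSet[int]


class DistinguishedMember:
    def __init__(self) -> None:
        self.last_seq = 0
        self.members: FrozenSet[int] = frozenset()

    def on_grp_state(self, msg: GrpState) -> int:
        """Apply msg if it is newer; return the sequence number to acknowledge.

        The returned value plays the role of GRP-REPLY: it acknowledges
        every GRP-STATE up to and including that sequence number.
        """
        if msg.seq > self.last_seq:
            self.last_seq = msg.seq
            # Full replacement: no merging of deltas is needed.
            self.members = msg.members
        return self.last_seq
```

Because each signal carries the whole group state, a lost or late signal does no lasting harm: the next one that arrives brings the receiver up to date, and a stale one is ignored on the basis of its sequence number.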

(6) The CHECK-FAIL signal is used to poll
the application task to detect unexpected failures of
the application. The application task never receives
this signal; however, should the task fail, the member
agent task will be notified by the underlying
operating system kernel.

(7) The SYNC-REPLY signal contains a reply
code of SYNC-DONE.

Figure 7 depicts the sequence for joining an existing
synchronization group; the Group Master Task
already exists, and the distinguished member is
used to notify (broadcast) the other members 18 of
the presence of a new member 18.
Notes:

(8) If the application has so requested, a
GROUP-CHANGE signal is sent by each member
agent 11 to the application whenever it detects a
change in status of the group. In the case of the
distinguished member the status change is gleaned
from the GRP-STATE signal.

(9) When the distinguished member receives
a GRP-STATE signal which indicates a group
change (not all do), it broadcasts the new state to
all other group members 18 using a MEM-MSG
signal. Each member 18 acknowledges such messages
with an ACK signal to provide reliable
communication.

(10) When the newly-joining member 18 receives
its MEM-MSG from the distinguished member
it will send a SYNC-REPLY signal to the
application (instead of a GROUP-CHANGE signal).
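
Notes (8) to (10) can be summarized in a small Python sketch: the distinguished member tracks which members have yet to acknowledge a broadcast, and each member agent decides which signal, if any, to raise to its own application. The names StateBroadcast, unacknowledged, on_ack and local_signal are hypothetical, and the empty string used for "no signal" is an assumption of this sketch.

```python
from typing import Iterable, List, Set


class StateBroadcast:
    """Hypothetical record of one state broadcast by the distinguished member."""

    def __init__(self, state: Iterable[int], dm_id: int) -> None:
        self.state = frozenset(state)
        # MEM-MSG goes to every member except the distinguished member itself.
        self.awaiting_ack: Set[int] = {m for m in self.state if m != dm_id}

    def unacknowledged(self) -> List[int]:
        return sorted(self.awaiting_ack)

    def on_ack(self, mem_id: int) -> None:
        # An ACK signal from a member clears its entry.
        self.awaiting_ack.discard(mem_id)

    def complete(self) -> bool:
        return not self.awaiting_ack


def local_signal(mem_id: int, joining_id: int, wants_changes: Set[int]) -> str:
    """Signal a member agent raises to its own application ('' for none)."""
    if mem_id == joining_id:
        return "SYNC-REPLY"  # note (10): the joiner gets its reply instead
    if mem_id in wants_changes:
        return "GROUP-CHANGE"  # note (8): only if the application asked for it
    return ""
```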
The control flow for the departure of a member 18
is shown in Figure 8. Note that the case of a
processing element 12 failure is not shown here
but is instead treated separately.

Notes: 

(11) The sequence shown here corresponds
to a voluntary departure initiated by the application
program issuing an UNSYNC directive. This results
in the member agent task terminating which, in
turn, sends a COMPLETE signal to the parent task,
the Sync Agent. The sequence is similar in situations
where the departure is not voluntary:
• When the application task fails, the member agent
11 is notified (through the failure of the CHECK-FAIL
message), which results in the termination of
the member agent task (and consequently, raising
of the COMPLETE signal).
• If the member agent task itself fails, the Sync
Agent is notified by the operating system kernel
with a COMPLETE signal.

(12) Indicates a <GROUP-CHANGE> signal
to an application task not shown in the Figure.

Figures 9a and 9b together depict the recovery of
the synchronization master control 13 (e.g. Sync
Master). The most general case is considered, i.e.
the case of a running synchronization service 10
with active synchronization groups. This includes,
as a subset, the case of a "cold" start.

Notes:

(13) Upon activation, Sync Master sends an
ACTIVATE signal to each configured Sync Agent.
The Sync Agents, whether they are already active or
not, will respond with an AGT-REPLY message
which includes a list of all member agents supported
on that processing element. (The MST-REPLY
signal is used for acknowledgement; refer to
Figure 6.)

(14) Following activation of the Sync Agent,
the Polling Control subcomponent 22 of Sync Master
Control sends a POLL-AGT message to which
the Sync Agent responds with an AGT-REPLY
message. This exchange is repeated periodically to
detect outages of the processing element.

(15) A Group Master is initiated only the first
time a group is encountered. Refer to Figure 6 for
further details on Group Master initiation.

(16) Before activating a Group Master, it is
provided with data regarding the status of its members
through GRP-DATA signals. Each signal contains
the information for one member agent of one
group. (This information is obtained from the AGT-REPLY
messages.) In this way, the Group Master
reconstructs the status of its group.

(17) After all Sync Agents have responded,
the reconstruction is complete and an ACTIVATE
signal is sent to all Group Masters. The Group
Masters respond by sending a GRP-STATE signal
to all distinguished members. Since this signal
contains the complete group state, any group
changes that might have occurred while the Sync
Master was down are detected.
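
The reconstruction step of notes (13) to (17) can be sketched in Python as a regrouping of the member agent lists returned by the Sync Agents. The function name rebuild_groups and the tuple layout used to model an AGT-REPLY are assumptions made for this illustration; the actual message formats are not given in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed model of an AGT-REPLY:
# (processing element name, [(group id, MEM-ID), ...])
AgtReply = Tuple[str, List[Tuple[int, int]]]


def rebuild_groups(replies: Iterable[AgtReply]) -> Dict[int, Set[Tuple[str, int]]]:
    """Regroup per-element member agent lists by synchronization group."""
    groups: Dict[int, Set[Tuple[str, int]]] = defaultdict(set)
    for pe, member_agents in replies:
        for group_id, mem_id in member_agents:
            # Each pair corresponds to one GRP-DATA signal:
            # one member agent of one group.
            groups[group_id].add((pe, mem_id))
    return dict(groups)
```

A group id appears in the result the first time one of its members is encountered, which mirrors note (15): a Group Master is initiated only once per group.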
Figure 10 depicts the procedure for handling the
failure of a processing element 12 which contains a
synchronization agent 14 (Sync Agent).
Notes:

(18) The Sync Master detects a failure of a
processing element when a TIME_OUT event is
received. This means that a Sync Agent has not
responded to a poll.



(19) For each group affected by the process- 
ing element failure, the Sync Master will send a 
GRP-EVENT signal to the respective Group Mas- 
ter. 
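
Notes (18) and (19) describe a timeout-based failure detector followed by a per-group notification. A minimal Python sketch is given below, assuming that time is supplied by the caller as a number of seconds and that a fixed timeout is used; the names PollingControl, on_agt_reply, check and affected_groups are hypothetical.

```python
from typing import Dict, List, Set


class PollingControl:
    """Hypothetical failure detector driven by AGT-REPLY arrival times."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_reply: Dict[str, float] = {}
        self.failed: Set[str] = set()

    def on_agt_reply(self, pe: str, now: float) -> bool:
        """Record a reply; return True if pe had been marked as failed."""
        was_failed = pe in self.failed
        self.failed.discard(pe)
        self.last_reply[pe] = now
        return was_failed

    def check(self, now: float) -> List[str]:
        """Return the processing elements that have newly timed out."""
        newly = [pe for pe, t in self.last_reply.items()
                 if pe not in self.failed and now - t > self.timeout]
        self.failed.update(newly)
        return sorted(newly)


def affected_groups(pe: str, membership: Dict[int, Set[str]]) -> List[int]:
    """Groups with at least one member on pe; each gets a GRP-EVENT."""
    return sorted(g for g, pes in membership.items() if pe in pes)
```

Each processing element is reported only once per outage, so the Group Masters of the affected groups are not flooded with repeated GRP-EVENT signals while the element stays down.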

Figure 11 depicts the procedure for the recovery of
a processing element 12 which is part of synchronization
service 10. Note that the recovery of processing
element 12 does not extend to recovering
member agent 11 tasks. It is assumed that these
will be recovered when the application tasks (i.e.
programs) which use them are restarted. Thus, the
only action to recover a processing element 12 is to
integrate the sync agent 14 with the rest of
synchronization service 10.
Notes:

(20) When a previously failed Sync Agent
finally responds to a POLL_AGT signal, the Sync
Master initiates the recovery procedure.

(21) The Sync Master registers the new processing
element 12 and sends an ACTIVATE signal
to the Sync Agent on that processing element
12. (Refer to Figure 9a for a more detailed description
of the activation sequence.)

Figure 12 depicts the procedure for member 18 to
member 18 messages. This procedure (protocol) is
used both for group broadcasts and point-to-point
messages between members 18.
Notes:

(22) In case of a broadcast, a copy of the
message is sent to each member 18. If the member
18 is not active, the message is not sent.

(23) Upon receiving a MEM-MSG signal
which indicates an application-level message, the
message is relayed to the application task responsible
for receiving asynchronous messages.

(24) A SYNC-REPLY (codes: GRP-ACK or
MSG-ACK) signal is sent back to the originator.
Figure 13 depicts the procedure employed in the
processing of all directives which require distinguished
member intervention (rights handling
directives).

Notes: 

(25) If the request is made on the distinguished
member site, then no message is sent.

(26) A reply signal (MEM-MSG followed by a 
SYNC-REPLY to the application). 
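
Since all rights handling directives pass through the distinguished member, mutual exclusion reduces to a single table held at one site. The Python sketch below illustrates the idea; the class RightsTable, its method names, and the reply strings "RIGHT-GRANTED" and "RIGHT-REFUSED" are hypothetical labels chosen for this example, and a busy right is assumed to be refused immediately rather than queued.

```python
from typing import Dict


class RightsTable:
    """Hypothetical table kept by the distinguished member.

    Maps each right name to the MEM-ID of the member currently holding it.
    """

    def __init__(self) -> None:
        self.holder: Dict[str, int] = {}

    def request_right(self, right: str, mem_id: int) -> str:
        current = self.holder.get(right)
        if current is None or current == mem_id:
            self.holder[right] = mem_id
            return "RIGHT-GRANTED"
        # Someone else holds the right: refuse rather than wait.
        return "RIGHT-REFUSED"

    def release_right(self, right: str, mem_id: int) -> None:
        # Only the current holder can release a right.
        if self.holder.get(right) == mem_id:
            del self.holder[right]
```

Because only one member is distinguished at any time, there is only one such table per group, and no more than one member can hold a given right.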

In one implementation made by the inventors,
the code for synchronization service 10 was contained
in ten files which were distributed into six
units, the usage dependencies (and compilation
order) of which are shown in Figure 14.

Notes:

SYNCCTRL contains the stub of the Sync Master
unit and directives to include three files
(SYNCMST, SYNCGMST, and SYNCPOLL) which
implement the Sync Master function. It also contains
the definitions required for the master
database 16.
SYNCMST is an "include" file which
contains the code for the Sync Master Task.
SYNCGMST is an "include" file which contains the
code for the Group Master Task.
SYNCPOLL contains the Polling Control 22.
SYNCLOCL contains the stub of the
unit as well as a definition of data and procedure
objects shared by the Sync Agent Task and the
Member Agent Tasks. It also contains directives to
include two files (SYNCAGT, SYNCMAGT).
SYNCRESX contains the definition of the SYNCHRONIZE
primitive. This unit must be loaded with
the application code which uses the synchronization
service.

SYNCDEFI contains a set of internal compile-object
definitions for the synchronization service. This
includes the definition of all entries used for
communication between synchronization service
components which are hidden from user programs. Since
this file contains only compile objects it is not
loaded.
SYNCDEFX contains a definition of all compile-objects
which are exported by the synchronization
service to its users. Since this file contains only
compile objects it is not loaded.
Application tasks which use the synchronization
service need to include SYNCRESX and SYNCDEFX
in their usage lists.

Note that SYNCCTRL implements the function
of synchronization master control 13 (Fig. 1), and
that SYNCLOCL and SYNCRESX together implement
the functions of member agent 11 and
synchronization agent 14 (Fig. 1). The files SYNCDEFI
and SYNCDEFX are not resident in the service;
they can be thought of as tools used in the
construction of the synchronization service but they
are not themselves a part of it.

Simplified pseudocode listings for the main
constituents of the invention follow as Appendix I.
They are believed to be self-explanatory. Any
elaboration of the material is accomplished through
the use of appended notes, to which attention is
directed.

As a further aid to the understanding and to the
use of the present invention the following (a copy
of a "User's Reference" to the synchronization
service of the present invention, as prepared by
one of the inventors) is included as Appendix II. It
will expand on the use of the present invention.



Claims 

1. A synchronization service [10] for use with a
computer having a distributed operating system, to
allow the construction of a customized synchronization
scheme, for synchronizing the constituent portions
of a distributed program, said service
comprising:

a general set of application-independent
synchronization primitives, whereby the construction of
said customized synchronization scheme is
achieved by the selective implementation of said
application-independent synchronization primitives.

2. The synchronization service of claim 1
wherein said application-independent primitives
comprise the following functions: synchronize;
synchronize done; and unsynchronize.

3. The synchronization service of claim 2
wherein said primitives further comprise the following
functions: request right; right granted; right
refused; release right; group broadcast; and group
acknowledge.

4. The synchronization service of claim 3 
wherein said primitives further comprise the follow- 
ing functions: unsynchronize done; send to mem- 
ber; and message acknowledge. 

5. A synchronization service [10] for use with a 
computer having an operating system [15] distrib- 
uted over a plurality of processing elements [12], to 
allow the construction of a customized synchroniza- 
tion scheme, for synchronizing the constituent por- 
tions [18] of a distributed program, said service 
comprising: 

a common synchronization master control means
[13];

a synchronization agent means [14] for each pro- 
cessing element; 

a plurality of application program components [18],
each component located on a different processing
element, each said component having associated
therewith a member agent [11], said member agent
being a program for interfacing with said
synchronization agent means, and said synchronization
agent means interfacing between said master control
means and said member agent, whereby a
customized synchronization scheme can be constructed
based upon a general set of application-independent
synchronization primitives contained
in both said synchronization agent means [14] and
said member agent [11] and accessed via said
synchronization agent means.

6. The synchronization service of claim 5
wherein said application-independent primitives
comprise the following functions: synchronize;
synchronize done; and unsynchronize.

7. The synchronization service of claim 6
wherein said application-independent primitives further
comprise the following functions: request right;
right granted; right refused; release right; group
broadcast; and group acknowledge.

8. A synchronization service [10] for use with a
computer having an operating system [15] distributed
over a plurality of processing elements, to
allow the construction of a customized synchronization
scheme, for synchronizing the constituent
components of a distributed program, said service
[10] comprising the steps of:

a) joining a program component [18] on a
first processing element [12] to a group of existing
program components [18] on at least a second
processing element [12] so that each of the existing
components is aware of the presence and
location of the joining components;

b) informing each member of a group of
physically distributed program components when
one or more components which are members of
said group depart from it;

c) selecting, as a distinguished member, one
program component from a group of distributed
program components such that, within said group,
there is never more than one said distinguished
member; and

d) providing mutually exclusive rights to said
group of distributed program components such that
no more than one said component can appropriate
a given right at any time.

9. The synchronization service of claim 8 further
including the step of providing reliable point-to-point
communication between said distributed
program components on the basis of their internal
group identifiers.

10. The synchronization service of claim 9 further
including the step of providing a broadcast
mechanism from any one program component to
all other program components which are currently
declared as being in the same group as the
broadcasting component.

11. The synchronization service of claim 10
wherein said program components are components
of an application program.

12. The synchronization service of claim 10
wherein said program components are components
of an operating system program.

13. The synchronization service of claim 8
wherein said physical processing elements are
logically distributed entities at one physical location.

14. A synchronization service [10], for use with
a computer having an operating system [15] distributed
over a plurality of processing elements
[12], to allow the construction of customized
synchronization schemes for synchronizing the constituent
components [18] of a distributed program,
said service comprising, as required, the steps of:

a) establishing a synchronization group for
said distributed program, said group comprising at
least one distributed program component [18];

b) joining a program component [18] to said
group of existing program components so that
each of the components is aware of the presence
and the location of all the other components in said
group;

c) informing each member of said group of
distributed program components when one or more
components which are members of said group
depart from it;

d) selecting, as a distinguished member for
said group, one program component from said
group of distributed program components such
that, within said group, there is never more than
one said distinguished member; and

e) providing mutually exclusive rights to said
group of distributed program components such that
no more than one said component can appropriate
a given right at any time.

15. The synchronization service of claim 14 
further including the step of providing full connec- 
tivity between all said distributed program compo- 
nents of said group. 

16. The synchronization service of claim 15 
wherein said distributed program is an application 
program. 

17. The synchronization service of claim 15 
wherein said distributed program is an operating 
system program. 

18. The synchronization service of claim 15
wherein each said program component [18] is on a
different processing element [12].

19. A synchronization service [10], for use with
a computer having an operating system [15] distributed
over a plurality of processing elements
[12], to allow the construction of customized
synchronization schemes for synchronizing the constituent
components [18] of a distributed program,
said service including a synchronization master
control [13] comprising:

master control means [21] for activating said
synchronization service;

polling means [22] for polling the processing elements
[12] associated with said components of
said distributed program so as to monitor the status
of said processing element;

control means [24] for joining new members [18] to
said group, and for handling departures of members
[18] from said group; and

a database means [16] containing information
representative of the current state of said
synchronization service at a given point in time.

20. The synchronization service of claim 19
further including, at each said processing element,
a synchronization agent [14] comprising:

means for accepting synchronization directives and
for creating corresponding member agents; and

means for monitoring the status of all active member
agents on said processing element and reporting
same to said synchronization master control
[13].

21. The synchronization service of claim 20
further including, at each said processing element
[12], a member agent [11] for each synchronization
group, comprising:

communications means [33] for providing a reliable
communications service between program components
[18];

storage means [32] for maintaining a local version
of the current state of all other program components
[18];

handler means [31] for providing the interface between
user tasks and said member agent [11]; and

distinguished member means [30] for implementing
the distinguished member function on only one
program component [18] at any given time.